AITopics | robert feldt

Collaborating Authors

robert feldt

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Challenges in Testing Large Language Model Based Software: A Faceted Taxonomy

Dobslaw, Felix, Feldt, Robert, Yoon, Juyeon, Yoo, Shin

arXiv.org Artificial IntelligenceMar-1-2025

Large Language Models (LLMs) and Multi-Agent LLMs (MALLMs) introduce non-determinism unlike traditional or machine learning software, requiring new approaches to verifying correctness beyond simple output comparisons or statistical accuracy over test datasets. This paper presents a taxonomy for LLM test case design, informed by both the research literature, our experience, and open-source tools that represent the state of practice. We identify key variation points that impact test correctness and highlight open challenges that the research, industry, and open-source communities must address as LLMs become integral to software systems. Our taxonomy defines four facets of LLM test case design, addressing ambiguity in both inputs and outputs while establishing best practices. It distinguishes variability in goals, the system under test, and inputs, and introduces two key oracle types: atomic and aggregated. Our mapping indicates that current tools insufficiently account for these variability points, highlighting the need for closer collaboration between academia and practitioners to improve the reliability and reproducibility of LLM testing.

language model, robert feldt, variation, (14 more...)

arXiv.org Artificial Intelligence

2503.00481

Country:

Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
Europe > Sweden > Jämtland County > Östersund (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Genre: Research Report (0.40)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Reducing DNN Labelling Cost using Surprise Adequacy: An Industrial Case Study for Autonomous Driving

Kim, Jinhan, Ju, Jeongil, Feldt, Robert, Yoo, Shin

arXiv.org Artificial IntelligenceSep-7-2020

Deep Neural Networks (DNNs) are rapidly being adopted by the automotive industry, due to their impressive performance in tasks that are essential for autonomous driving. Object segmentation is one such task: its aim is to precisely locate boundaries of objects and classify the identified objects, helping autonomous cars to recognise the road environment and the traffic situation. Not only is this task safety critical, but developing a DNN based object segmentation module presents a set of challenges that are significantly different from traditional development of safety critical software. The development process in use consists of multiple iterations of data collection, labelling, training, and evaluation. Among these stages, training and evaluation are computation intensive while data collection and labelling are manual labour intensive. This paper shows how development of DNN based object segmentation can be improved by exploiting the correlation between Surprise Adequacy (SA) and model performance. The correlation allows us to predict model performance for inputs without manually labelling them. This, in turn, enables understanding of model performance, more guided data collection, and informed decisions about further training. In our industrial case study the technique allows cost savings of up to 50% with negligible evaluation inaccuracy. Furthermore, engineers can trade off cost savings versus the tolerable level of inaccuracy depending on different development phases and scenarios.

artificial intelligence, machine learning, segmentation, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3368089.3417065

2006.00894

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback